An Experimental Evaluation of Japanese Tokenizers for Sentiment-Based Text Classification
Rusli, Andre, Shishido, Makoto
This study investigates the performance of three popular tokenization tools: MeCab, Sudachi, and SentencePiece, when applied as a preprocessing step for sentiment-based text classification of Japanese texts. Using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, we evaluate two traditional machine learning classifiers: Multinomial Naive Bayes and Logistic Regression. The results reveal that Sudachi produces tokens closely aligned with dictionary definitions, while MeCab and SentencePiece demonstrate faster processing speeds. The combination of SentencePiece, TF-IDF, and Logistic Regression outperforms the other alternatives in terms of classification performance.
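As a rough illustration of the TF-IDF step described above, the following sketch computes TF-IDF weights over documents that have already been tokenized. The example sentences, the `tfidf` helper, and the smoothed IDF formula are assumptions for illustration; the abstract does not state the exact vectorizer settings used in the study.

```python
import math
from collections import Counter


def tfidf(tokenized_docs):
    """Return one {token: weight} dict per document.

    Uses raw term frequency and a smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    This is one common variant, assumed here; it is not taken from the paper.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each token appears.
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return vectors


# Hypothetical, already-tokenized reviews. In practice a tokenizer such as
# MeCab, Sudachi, or SentencePiece would produce these token lists.
docs = [
    ["この", "映画", "は", "最高"],
    ["この", "映画", "は", "最悪"],
    ["最高", "の", "体験"],
]
vectors = tfidf(docs)
```

Tokens that occur in fewer documents receive a larger weight, which is why the choice of tokenizer changes the features a downstream classifier such as Logistic Regression sees.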
Decoupled Weight Decay for Any $p$ Norm
Outmezguine, Nadav Joseph, Levi, Noam
With the success of deep neural networks (NNs) in a variety of domains, the computational and storage requirements for training and deploying large NNs have become a bottleneck for further improvements. Sparsification has consequently emerged as a leading approach to tackle these issues. In this work, we consider a simple yet effective approach to sparsification, based on the Bridge, or $L_p$ regularization during training. We introduce a novel weight decay scheme, which generalizes the standard $L_2$ weight decay to any $p$ norm. We show that this scheme is compatible with adaptive optimizers, and avoids the gradient divergence associated with $0
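The gradient divergence mentioned above can be seen from the derivative of the penalty $|w|^p$, which is $p\,|w|^{p-1}\,\mathrm{sign}(w)$ and grows without bound as $w \to 0$ whenever $p < 1$. The sketch below only illustrates this standard fact; it is not the weight decay scheme proposed in the paper, and the function name is hypothetical.

```python
def lp_penalty_grad(w, p):
    """Gradient of |w|**p with respect to w, defined for w != 0."""
    if w == 0:
        raise ValueError("gradient of |w|**p is not defined at w = 0 in this sketch")
    sign = 1.0 if w > 0 else -1.0
    return p * abs(w) ** (p - 1) * sign


# Compare p = 2 (standard weight decay) with p = 0.5 as the weight shrinks.
for w in (1.0, 1e-2, 1e-4):
    print(w, lp_penalty_grad(w, 2.0), lp_penalty_grad(w, 0.5))
```

For $p = 2$ the gradient vanishes with the weight, while for $p = 0.5$ it increases as the weight approaches zero.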
Android Malware Detection with Unbiased Confidence Guarantees
Papadopoulos, Harris, Georgiou, Nestoras, Eliades, Charalambos, Konstantinidis, Andreas
The impressive growth of smartphone devices, in combination with the rising ubiquity of mobile platforms for sensitive applications such as Internet banking, has triggered a rapid increase in mobile malware. In recent literature, many studies examine Machine Learning techniques as the most promising approach for mobile malware detection without, however, quantifying the uncertainty involved in their detections. In this paper, we address this problem by proposing a machine learning dynamic analysis approach that provides provably valid confidence guarantees in each malware detection. Moreover, these guarantees hold for both the malicious and benign classes independently and are unaffected by any bias in the data. The proposed approach is based on a novel machine learning framework, called Conformal Prediction, combined with a random forests classifier. We examine its performance on a large-scale dataset collected by installing 1866 malicious and 4816 benign applications on a real Android device. We make this collection of dynamic analysis data available to the research community. The obtained experimental results demonstrate the empirical validity, usefulness and unbiased nature of the outputs produced by the proposed approach.
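One way to obtain guarantees that hold for each class independently is label-conditional conformal prediction, where the p-value for a candidate label is computed only against calibration examples of that same label. The sketch below assumes this variant and uses made-up nonconformity scores (for example, one minus a classifier's estimated probability for the label); the abstract does not specify the nonconformity measure, so all names and numbers here are hypothetical.

```python
def label_conditional_p_value(calib_scores, calib_labels, test_score, label):
    """p-value of `label` for a test example.

    Only calibration examples whose true label equals `label` are used,
    so each class is calibrated separately.
    """
    same_class = [s for s, y in zip(calib_scores, calib_labels) if y == label]
    at_least_as_strange = sum(1 for s in same_class if s >= test_score)
    return (at_least_as_strange + 1) / (len(same_class) + 1)


# Hypothetical calibration data: higher score means the example looks
# less like its labelled class.
calib_scores = [0.1, 0.2, 0.7, 0.3, 0.9, 0.4]
calib_labels = ["benign", "benign", "benign", "malware", "malware", "malware"]

# Hypothetical nonconformity of one test app under each candidate label.
test_scores = {"benign": 0.8, "malware": 0.2}

p_values = {
    label: label_conditional_p_value(calib_scores, calib_labels, score, label)
    for label, score in test_scores.items()
}

# Prediction set at a chosen significance level: keep labels not rejected.
significance = 0.3
prediction_set = [label for label, p in p_values.items() if p > significance]
```

With these numbers the benign label is rejected and only the malware label remains in the prediction set.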
Applied Computer Vision on 2-Dimensional Lung X-Ray Images for Assisted Medical Diagnosis of Pneumonia
Ligueran, Ralph Joseph S. D., Santos, Manuel Luis C. Delos, Tinio, Dr. Ronaldo S., Valencia, Emmanuel H.
This study focuses on the application of a specific subfield of artificial intelligence referred to as computer vision in the analysis of 2-dimensional lung x-ray images for the assisted medical diagnosis of ordinary pneumonia. A convolutional neural network algorithm was implemented in a Python-coded, Flask-based web application that can analyze x-ray images for the detection of ordinary pneumonia. Since convolutional neural network algorithms rely on machine learning for the identification and detection of patterns, a technique referred to as transfer learning was implemented to train the neural network in the identification and detection of patterns within the dataset. Open-source lung x-ray images were used as training data to create a knowledge base that served as the core element of the web application. The experimental design employed a 5-Trial Confirmatory Test for the validation of the web application. The results of the 5-Trial Confirmatory Test show the calculation of Diagnostic Precision Percentage per Trial, General Diagnostic Precision Percentage, and General Diagnostic Error Percentage, while the Confusion Matrix further shows the relationship between the label and the corresponding diagnosis result of the web application for each test image. The developed web application can be used by medical practitioners in A.I.-assisted diagnosis of ordinary pneumonia, and by researchers in the fields of computer science and bioinformatics.
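The abstract does not define its percentage measures. The sketch below assumes that the precision percentage is the share of test images whose diagnosis matches the label and that the error percentage is its complement; the labels, predictions, and function names are hypothetical and are not results from the study.

```python
from collections import Counter


def confusion_counts(labels, predictions):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(labels, predictions))


def precision_and_error_percentage(labels, predictions):
    """Assumed definitions: percent of matching diagnoses and its complement."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    precision_pct = 100.0 * correct / len(labels)
    return precision_pct, 100.0 - precision_pct


# Hypothetical outcome of a single trial with five test images.
labels = ["pneumonia", "pneumonia", "normal", "normal", "pneumonia"]
predictions = ["pneumonia", "normal", "normal", "normal", "pneumonia"]

matrix = confusion_counts(labels, predictions)
precision_pct, error_pct = precision_and_error_percentage(labels, predictions)
```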
Artificial Intelligence in Cyber Security: Benefits and Drawbacks.
You can use artificial intelligence (AI) to automate complex repetitive tasks much faster than a human. AI technology can sort complex, repetitive input logically. That's why AI is used for facial recognition and self-driving cars. But this ability also paved the way for AI cybersecurity. This is especially helpful in assessing threats in complex organizations. When business structures are continually changing, administrators cannot identify weaknesses with traditional methods.
Revisiting Inaccuracies of Time Series Averaging under Dynamic Time Warping
This article revisits an analysis of inaccuracies of time series averaging under dynamic time warping conducted by \cite{Niennattrakul2007}. The authors presented a correctness-criterion and introduced drift-outs of averages from clusters. They claimed that averages are inaccurate if they are incorrect or drift-outs. Furthermore, they conjectured that such inaccuracies are caused by the lack of triangle inequality. We show that a rectified version of the correctness-criterion is unsatisfiable and that the concept of drift-out is geometrically and operationally inconclusive. Satisfying the triangle inequality is insufficient to achieve correctness and unnecessary to overcome the drift-out phenomenon. We place the concept of drift-out on a principled basis and show that sample means as global minimizers of a Fr\'echet function never drift out. The adjusted drift-out is a way to test the extent to which an approximation is coherent. Empirical results show that solutions obtained by the state-of-the-art methods SSG and DBA are incoherent approximations of a sample mean in over a third of all trials.
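For readers unfamiliar with the objects involved, the following sketch computes a dynamic time warping cost and the corresponding Fr\'echet function of a candidate average. It assumes squared differences as local costs and defines the Fr\'echet function as the mean of these costs over the sample; the article's exact conventions may differ, and the function names and toy series are hypothetical.

```python
def dtw_squared(x, y):
    """DTW cost between two sequences using squared local differences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j]: minimal cost of aligning the first i points of x
    # with the first j points of y.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def frechet(candidate, sample):
    """Mean DTW cost from a candidate average to every series in the sample."""
    return sum(dtw_squared(candidate, series) for series in sample) / len(sample)


# Toy sample: the second series is a time-stretched copy of the first.
sample = [[0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]]
value_at_first = frechet([0.0, 1.0, 2.0], sample)
value_at_constant = frechet([1.0, 1.0, 1.0], sample)
```

A sample mean in this sense is a series that globally minimizes `frechet`; methods such as SSG and DBA search for approximate minimizers.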